{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setting up the Working Directory\n",
    "This cell is to ensure we change the directory to anomalib source code to have access to the datasets and config files. We assume that you already went through `001_getting_started.ipynb` and install the required packages."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from pathlib import Path\n",
    "\n",
    "from git.repo import Repo\n",
    "\n",
    "current_directory = Path.cwd()\n",
    "if current_directory.name == \"100_datamodules\":\n",
    "    # On the assumption that, the notebook is located in\n",
    "    #   ~/anomalib/notebooks/100_datamodules/\n",
    "    root_directory = current_directory.parent.parent\n",
    "elif current_directory.name == \"anomalib\":\n",
    "    # This means that the notebook is run from the main anomalib directory.\n",
    "    root_directory = current_directory\n",
    "else:\n",
    "    # Otherwise, we'll need to clone the anomalib repo to the `current_directory`\n",
    "    repo = Repo.clone_from(url=\"https://github.com/openvinotoolkit/anomalib.git\", to_path=current_directory)\n",
    "    root_directory = current_directory / \"anomalib\"\n",
    "\n",
    "os.chdir(root_directory)\n",
    "folder_dataset_root = root_directory / \"datasets\" / \"hazelnut_toy\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Use Folder Dataset (for Custom Datasets) via API\n",
    "\n",
    "Here we show how one can utilize custom datasets to train anomalib models. A custom dataset in this model can be of the following types:\n",
    "\n",
    "- A dataset with good and bad images.\n",
    "- A dataset with good and bad images as well as mask ground-truths for pixel-wise evaluation.\n",
    "- A dataset with good and bad images that is already split into training and testing sets.\n",
    "\n",
    "To experiment this setting we provide a toy dataset that could be downloaded from the following [https://github.com/openvinotoolkit/anomalib/blob/main/docs/source/data/hazelnut_toy.zip](link). For the rest of the tutorial, we assume that the dataset is downloaded and extracted to `../../datasets`, located in the `anomalib` directory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# pylint: disable=wrong-import-position, wrong-import-order\n",
    "# flake8: noqa\n",
    "import numpy as np\n",
    "from PIL import Image\n",
    "from torchvision.transforms import ToPILImage\n",
    "\n",
    "from anomalib.data.folder import Folder, FolderDataset\n",
    "from anomalib.data.utils import InputNormalizationMethod, get_transforms"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### DataModule\n",
    "\n",
    "Similar to how we created the datamodules for existing benchmarking datasets in the previous tutorials, we can also create an Anomalib datamodule for our custom hazelnut dataset.\n",
    "\n",
    "In addition to the root folder of the dataset, we now also specify which folder contains the normal images, which folder contains the anomalous images, and which folder contains the ground truth masks for the anomalous images.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "folder_datamodule = Folder(\n",
    "    root=folder_dataset_root,\n",
    "    normal_dir=\"good\",\n",
    "    abnormal_dir=\"crack\",\n",
    "    task=\"segmentation\",\n",
    "    mask_dir=folder_dataset_root / \"mask\" / \"crack\",\n",
    "    image_size=256,\n",
    "    normalization=InputNormalizationMethod.NONE,  # don't apply normalization, as we want to visualize the images\n",
    ")\n",
    "folder_datamodule.setup()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Train images\n",
    "i, data = next(enumerate(folder_datamodule.train_dataloader()))\n",
    "print(data.keys(), data[\"image\"].shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Test images\n",
    "i, data = next(enumerate(folder_datamodule.test_dataloader()))\n",
    "print(data.keys(), data[\"image\"].shape, data[\"mask\"].shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As can be seen above, creating the dataloaders are pretty straghtforward, which could be directly used for training/testing/inference. We could visualize samples from the dataloaders as well."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "img = ToPILImage()(data[\"image\"][0].clone())\n",
    "msk = ToPILImage()(data[\"mask\"][0]).convert(\"RGB\")\n",
    "\n",
    "Image.fromarray(np.hstack((np.array(img), np.array(msk))))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`Folder` data module offers much more flexibility cater all different sorts of needs. Please refer to the documentation for more details."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Torch Dataset\n",
    "\n",
    "As in earlier examples, we can also create a standalone PyTorch dataset instance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "FolderDataset??"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To create `FolderDataset` we need to create the albumentations object that applies transforms to the input image."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "get_transforms??"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "image_size = (256, 256)\n",
    "transform = get_transforms(image_size=256, normalization=InputNormalizationMethod.NONE)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Classification Task"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "folder_dataset_classification_train = FolderDataset(\n",
    "    normal_dir=folder_dataset_root / \"good\",\n",
    "    abnormal_dir=folder_dataset_root / \"crack\",\n",
    "    split=\"train\",\n",
    "    transform=transform,\n",
    "    task=\"classification\",\n",
    ")\n",
    "folder_dataset_classification_train.setup()\n",
    "folder_dataset_classification_train.samples.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's look at the first sample in the dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = folder_dataset_classification_train[0]\n",
    "print(data.keys(), data[\"image\"].shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As can be seen above, when we choose `classification` task and `train` split, the dataset only returns `image`. This is mainly because training only requires normal images and no labels. Now let's try `test` split for the `classification` task"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Folder Classification Test Set\n",
    "folder_dataset_classification_test = FolderDataset(\n",
    "    normal_dir=folder_dataset_root / \"good\",\n",
    "    abnormal_dir=folder_dataset_root / \"crack\",\n",
    "    split=\"test\",\n",
    "    transform=transform,\n",
    "    task=\"classification\",\n",
    ")\n",
    "folder_dataset_classification_test.setup()\n",
    "folder_dataset_classification_test.samples.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = folder_dataset_classification_test[0]\n",
    "print(data.keys(), data[\"image\"].shape, data[\"image_path\"], data[\"label\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Segmentation Task\n",
    "\n",
    "It is also possible to configure the Folder dataset for the segmentation task, where the dataset object returns image and ground-truth mask."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Folder Segmentation Train Set\n",
    "folder_dataset_segmentation_train = FolderDataset(\n",
    "    normal_dir=folder_dataset_root / \"good\",\n",
    "    abnormal_dir=folder_dataset_root / \"crack\",\n",
    "    split=\"train\",\n",
    "    transform=transform,\n",
    "    mask_dir=folder_dataset_root / \"mask\" / \"crack\",\n",
    "    task=\"segmentation\",\n",
    ")\n",
    "folder_dataset_segmentation_train.setup()  # like the datamodule, the dataset needs to be set up before use\n",
    "folder_dataset_segmentation_train.samples.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Folder Segmentation Test Set\n",
    "folder_dataset_segmentation_test = FolderDataset(\n",
    "    normal_dir=folder_dataset_root / \"good\",\n",
    "    abnormal_dir=folder_dataset_root / \"crack\",\n",
    "    split=\"test\",\n",
    "    transform=transform,\n",
    "    mask_dir=folder_dataset_root / \"mask\" / \"crack\",\n",
    "    task=\"segmentation\",\n",
    ")\n",
    "folder_dataset_segmentation_test.setup()  # like the datamodule, the dataset needs to be set up before use\n",
    "folder_dataset_segmentation_test.samples.head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = folder_dataset_segmentation_test[3]\n",
    "print(data.keys(), data[\"image\"].shape, data[\"mask\"].shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's visualize the image and the mask..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "img = ToPILImage()(data[\"image\"].clone())\n",
    "msk = ToPILImage()(data[\"mask\"]).convert(\"RGB\")\n",
    "\n",
    "Image.fromarray(np.hstack((np.array(img), np.array(msk))))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "anomalib",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.13 (default, Nov  6 2022, 23:15:27) \n[GCC 9.3.0]"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "ae223df28f60859a2f400fae8b3a1034248e0a469f5599fd9a89c32908ed7a84"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
